The D2Q2 Framework: On the Relationship and Combination of Language Modelling and TF-IDF

Authors

  • Thomas Roelleke
  • Hany Azzam
  • Marco Bonzanini
  • Miguel Martinez-Alvarez
  • Mounia Lalmas
Abstract

Language Modelling (LM) and TF-IDF are two retrieval models with different foundations. There have been efforts aiming at establishing the relationship between these models, and whether one includes the other. Whether their combination could yield a third and better model is an open research question. This paper revisits the foundations of LM and TF-IDF and explores how these models’ bare structures relate and how these structures can be combined. We begin with the premise that TF-IDF is the P(d|q)/P(d) side of retrieval, which complements the common view that LM is P(q|d)/P(q). Next, a hybrid framework based on the decomposition of the product of the two sides, P(d|q)/P(d) · P(q|d)/P(q), is developed. This leads to the D2Q2 family of models, which joins the inner components of LM and TF-IDF instead of combining their scores. This paper provides new insights into the relationship between LM and TF-IDF, and experimental results show that the D2Q2 models perform comparably to competitive baselines.
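
The abstract does not spell out how the two sides relate, so the following is only a sketch under one stated assumption: that both ratios are taken with respect to a single, consistent joint distribution P(d, q). In that case Bayes' theorem makes the two sides equal, and their product is the square of one shared ratio:

```latex
% Assumption (not from the abstract): one consistent joint distribution P(d, q).
\frac{P(d \mid q)}{P(d)}
  = \frac{P(d, q)}{P(d)\,P(q)}
  = \frac{P(q \mid d)}{P(q)}

% Product of the two sides under the same assumption:
\frac{P(d \mid q)}{P(d)} \cdot \frac{P(q \mid d)}{P(q)}
  = \left( \frac{P(d, q)}{P(d)\,P(q)} \right)^{2}
```

In practice the two sides are estimated and decomposed differently (document-side versus query-side statistics), so their values need not coincide. How the paper decomposes each side into the D2Q2 components is not stated in the abstract and is not reproduced here.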

Similar resources

Design of an intelligent ontology construction system using the ART neural network and the C-value method

In recent years, many efforts have been made to design ontology learning methods and automate the ontology construction process. Ontology construction is time-consuming and costly in almost all domains and applications, so automating it is one way to overcome the knowledge acquisition bottleneck in information systems and reduce construction cost. In this artic...

An Attribute-based Model for Semantic Retrieval

This paper introduces a knowledge-oriented approach for modelling semantic search. The modelling approach represents both semantic and textual data in one unifying framework, referred to as the probabilistic object-relational content modelling framework. The framework facilitates the transformation of “term-only” retrieval models into “semantic-aware” retrieval models that consist of semantic p...

Probabilistic retrieval models: relationships, context-specific application, selection and implementation

Retrieval models are the core components of information retrieval systems, which guide the document and query representations, as well as the document ranking schemes. TF-IDF, binary independence retrieval (BIR) model and language modelling (LM) are three of the most influential contemporary models due to their stability and performance. The BIR model and LM have probabilistic theory as their b...

Language Models, Smoothing, and IDF Weighting

In this paper, we investigate the relationship between smoothing in language models and idf weights. Language models regard the relative within-document-frequency and the relative collection frequency; idf weights are very similar to the latter, but yield higher weights for rare terms. Regarding the correlation between the language model parameters and relevance for two test collections, we fin...

Perceptual Learning Style Preferences and Computer-Assisted Writing Achievement within the Activity Theory Framework

Learning styles are considered among the significant factors that aid instructors in deciding how well their students learn a second or foreign language (Oxford, 2003). Although this issue has been accepted broadly in educational psychology, further research is required to examine the relationship between learning styles and language learning skills. Thus, the present study was carried out to in...

Publication date: 2013